What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus
Authors
Abstract
We present the first large-scale English “all-words lexical substitution” corpus. The size of the corpus provides a rich resource for investigations into word meaning. We investigate the nature of lexical substitute sets, comparing them to WordNet synsets. We find them to be consistent with, but more fine-grained than, synsets. We also identify significant differences from the results for paraphrase ranking in context reported for the SEMEVAL lexical substitution data. This highlights the influence of corpus construction approaches on evaluation results.
Similar resources
Who evoked that frame? Some thoughts on context effects and event types
Lexical substitution is an annotation task in which annotators provide one-word paraphrases (lexical substitutes) for individual target words in a sentence context. Lexical substitution yields a fine-grained characterization of word meaning that can be done by non-expert annotators. We discuss results of a recent lexical substitution annotation effort, where we found strong contextual modulatio...
Explorations in lexical sample and all-words lexical substitution
In this paper, we experiment with several techniques to solve the problem of lexical substitution, both in a lexical sample as well as an all-words setting, and compare the benefits of combining multiple lexical resources using both unsupervised and supervised approaches. Overall in the lexical sample setting, the results obtained through the combination of several resources exceed the current ...
KU: Word Sense Disambiguation by Substitution
Data sparsity is one of the main factors that make word sense disambiguation (WSD) difficult. To overcome this problem we need to find effective ways to use resources other than sense labeled data. In this paper I describe a WSD system that uses a statistical language model based on a large unannotated corpus. The model is used to evaluate the likelihood of various substitutes for a word in a g...
Investigation into Human Preference between Common and Unambiguous Lexical Substitutions
We present a study that investigates the factors that determine what makes a good lexical substitution. We begin by observing that there is a correlation between the corpus frequency of words and the number of WordNet senses they have, and hypothesise that readers might prefer common, but more ambiguous words over less ambiguous but also less common ones. We identify four properties of a word ...
The Use of Lexical Bundles in Native and Non-native Post-graduate Writing: The Case of Applied Linguistics MA Theses
Connor et al. (2008) mention “specifying textual requirements of genres” (p.12) as one of the reasons which have motivated researchers in the analysis of writing. Members of each genre should be able to produce and retrieve these textual requirements appropriately to be considered communicatively proficient. One of the textual requirements of genres is regularities of specific forms and content...
Publication date: 2014